Add leadership "domains" so multiple Rivers can operate in one schema #1113
Conversation
@bgentry This works (I think), but still needs some testing added. Wanted to get your general reaction before taking it all the way.
Force-pushed 0819eec to 1c32845
Okay, the tests for this should be in good shape now.
| @@ -0,0 +1,3 @@
| ALTER TABLE /* TEMPLATE: schema */river_leader
Since we're adding a migration here, before shipping I'd also go through the existing "needs migration" issues and pull them in.
One nice thing about this migration as currently written is that if you did not run it, it wouldn't be a problem as long as you didn't try to use LeaderDomain. So it's safer than your average migration.
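For context on that safety point, the quoted migration is tiny. Roughly what the Postgres side presumably does (a sketch only; the constraint name below is a guess, not copied from the PR): drop the check that previously forced `river_leader.name` to always be `default`, so that each leader domain can hold its own row.

    -- Sketch for illustration; constraint name is assumed, not taken from the diff.
    ALTER TABLE /* TEMPLATE: schema */river_leader
        DROP CONSTRAINT river_leader_name_check;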
Force-pushed ec7e682 to f41afcc
| // because the default client(s) will infringe on the domains of the
| // non-default one(s).
| //
| // Certain maintenance services that aren't queue-related like the indexer
| // Certain maintenance services that aren't queue-related like the indexer
| // Certain maintenance services that aren't queue-related like the reindexer
| // In general, most River users should not need LeaderDomain, and when
| // running multiple Rivers may want to consider using multiple databases and
| // multiple schemas instead.
I would probably try to get some version of this warning into the first paragraph to dissuade people from using this feature. Could describe it as "an advanced option" or something like that. I would just like to ensure people don't start using this setting automatically just because it's there, since it has some major footguns.
Yep, great point. Fixed.
| // A warning though that River *does not protect against configuration
| // mistakes*. If client1 on domain1 is configured for queue_a and queue_b,
| // and client2 on domain2 is *also* configured for queue_a and queue_b, then
| // both clients may end up running maintenance services on the same queues
| // at the same time. It's the caller's responsibility to ensure that doesn't
| // happen.
Gotta say this definitely feels dangerous. I'm wondering what else can break, including in Pro features, if we allow breaking one of the main promises of leader election (that there can only be one).
Yeah, it's admittedly not super great. I think the only alternatives, though, involve storing something in the database so clients can compare notes with one another, which is also not great for other reasons.
| // It's important for queuesIncluded to be `nil` in case it's not in use
| // for the various driver queries to work correctly.
maybe the driver layer should handle nil and []string{} equivalently to avoid needing to deal with that concern at this level?
Yeah, good call. Modified the drivers to do that and added tests to each query in question.
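For anyone reading along later, the driver-level normalization being described is roughly this kind of thing (a sketch, not the actual driver code; the package, function, and parameter names are made up):

    package driverutil // illustrative package name

    // normalizeQueuesIncluded treats a nil slice and an empty slice the same so
    // that downstream queries see both as "no queue filter". Sketch only; the
    // real change lives in each driver's query layer plus its tests.
    func normalizeQueuesIncluded(queuesIncluded []string) []string {
        if len(queuesIncluded) == 0 {
            return nil
        }
        return queuesIncluded
    }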
| var sub *notifier.Subscription
| if e.notifier == nil {
| e.Logger.DebugContext(ctx, e.Name+": No notifier configured; starting in poll mode", "client_id", e.config.ClientID)
| e.Logger.DebugContext(ctx, e.Name+": Resigned leadership successfully", "client_id", e.config.ClientID, "domain", e.config.Domain)
was this log text altered by mistake?
| }
| e.Logger.DebugContext(ctx, e.Name+": Current leader attempting reelect", "client_id", e.config.ClientID) | ||
| e.Logger.InfoContext(ctx, e.Name+": Current leader received forced resignation", "client_id", e.config.ClientID, "domain", e.config.Domain) |
the text here was also changed, I think accidentally
| // another client, even in the event of a cancellation.
| func (e *Elector) attemptResignLoop(ctx context.Context) {
| e.Logger.DebugContext(ctx, e.Name+": Attempting to resign leadership", "client_id", e.config.ClientID)
| e.Logger.InfoContext(ctx, e.Name+": Current leader received forced resignation", "client_id", e.config.ClientID, "domain", e.config.Domain)
I don't think this text is necessarily true; there are other reasons this could be called besides a forced resignation.
Yeah, I think it's wrong. Reverted.
| -- name: JobDeleteBefore :execresult
| DELETE FROM /* TEMPLATE: schema */river_job
| WHERE
| id IN (
| SELECT id
| FROM /* TEMPLATE: schema */river_job
| WHERE
| WHERE id IN (
| SELECT id
| FROM /* TEMPLATE: schema */river_job
| WHERE (
| (state = 'cancelled' AND finalized_at < cast(@cancelled_finalized_at_horizon AS text)) OR
| (state = 'completed' AND finalized_at < cast(@completed_finalized_at_horizon AS text)) OR
| (state = 'discarded' AND finalized_at < cast(@discarded_finalized_at_horizon AS text))
| ORDER BY id
| LIMIT @max
| )
| -- This is really awful, but unless the `sqlc.slice` appears as the very
| -- last parameter in the query things will fail if it includes more than one
| -- element. The sqlc SQLite driver uses position-based placeholders (?1) for
| -- most parameters, but unnamed ones with `sqlc.slice` (?), and when
| -- positional parameters follow unnamed parameters great confusion is the
| -- result. Making sure `sqlc.slice` is last is the only workaround I could
| -- find, but it stops working if there are multiple clauses that need a
| -- positional placeholder plus `sqlc.slice` like this one (the Postgres
| -- driver supports a `queues_included` parameter that I couldn't support
| -- here). The non-workaround version is (unfortunately) to never, ever use
| -- the sqlc driver for SQLite -- it's not a little buggy, it's off the
| -- charts buggy, and there's little interest from the maintainers in fixing
| -- any of it. We already started using it though, so plough on.
| AND (
| cast(@queues_excluded_empty AS boolean)
| OR river_job.queue NOT IN (sqlc.slice('queues_excluded'))
| );
| )
| AND (/* TEMPLATE_BEGIN: queues_excluded_clause */ true /* TEMPLATE_END */)
| AND (/* TEMPLATE_BEGIN: queues_included_clause */ true /* TEMPLATE_END */)
| ORDER BY id
| LIMIT @max
| );
This comment applies to all drivers but I'm leaving it here because it doesn't seem like the pgx driver's JobDeleteBefore was updated in this PR yet. This query is currently targeted at the following index:
"river_job_state_and_finalized_at_index" btree (state, finalized_at) WHERE finalized_at IS NOT NULL
That will break if we're also filtering by queue, and this may turn into a sequential scan.
I haven't checked the other queries you've modified here, but this is an important consideration we'd need to be very careful of for all of them.
@bgentry I ended up having the LLM run some minor benchmarking and checking index use. Here's what it produced:
When QueuesIncluded is NULL (the default, no LeaderDomain set): the query uses river_job_state_and_finalized_at_index as a single index scan, exactly as before. No change in behavior.

When QueuesIncluded is a concrete array (non-default LeaderDomain): the planner switches to a BitmapOr strategy with three legs:
- cancelled leg: uses river_job_state_and_finalized_at_index on (state, finalized_at)
- completed leg: uses river_job_prioritized_fetching_index on (state, queue)
- discarded leg: uses river_job_state_and_finalized_at_index on (state, finalized_at)

Then finalized_at and queue are applied as heap filters.

This is actually reasonable planner behavior. For the completed leg, when filtering by specific queues, the planner correctly identifies that (state, queue) from the fetching index is more selective than (state, finalized_at); it narrows to both the state and the small set of queues before checking finalized_at. The total estimated cost (21.76) is lower than the NULL-case cost (194.13).
Maybe read that over to make sure you're satisfied, but I think even with the added queues, we should still be good here.
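If anyone wants to reproduce that check by hand, something along these lines should do it (a sketch against an untemplated river_job table; the horizons, queue names, and limit are made up):

    -- With the queue filter present this should show the BitmapOr plan described
    -- above; delete the "AND queue IN (...)" line to see the single scan over
    -- river_job_state_and_finalized_at_index that matches today's behavior.
    EXPLAIN
    SELECT id
    FROM river_job
    WHERE (
        (state = 'cancelled' AND finalized_at < now() - interval '1 hour') OR
        (state = 'completed' AND finalized_at < now() - interval '1 hour') OR
        (state = 'discarded' AND finalized_at < now() - interval '7 days')
    )
    AND queue IN ('queue_a', 'queue_b')
    ORDER BY id
    LIMIT 1000;

The exact plan will of course depend on table stats, so EXPLAIN ANALYZE on a realistically sized dataset is the more convincing version of the same check.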
| -- Alter `river_leader` to remove check constraint that `name` must be
| -- `default`. SQLite doesn't allow schema modifications, so this redefines the
| -- table entirely.
SQLite doesn't allow schema modifications, so this redefines the table entirely.
🤯
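For readers not steeped in SQLite: its ALTER TABLE can rename things and add or drop columns, but it can't drop a CHECK constraint, so the usual workaround is to rebuild the table and swap it in. Roughly like this (the column list here is illustrative, not copied from the actual migration):

    -- Sketch of the standard SQLite "redefine the table" dance.
    CREATE TABLE river_leader_new (
        elected_at TIMESTAMP NOT NULL,
        expires_at TIMESTAMP NOT NULL,
        leader_id TEXT NOT NULL,
        name TEXT PRIMARY KEY NOT NULL -- the old CHECK (name = 'default') is simply omitted
    );
    INSERT INTO river_leader_new SELECT elected_at, expires_at, leader_id, name FROM river_leader;
    DROP TABLE river_leader;
    ALTER TABLE river_leader_new RENAME TO river_leader;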
| updatedSQL := sql
| updatedSQL = replaceTemplate(updatedSQL, templateBeginEndRE)
| updatedSQL = replaceTemplate(updatedSQL, templateRE)
| updatedSQL = replaceTemplate(updatedSQL, templateBeginEndRE)
What's going on with the reordering here?
Just accidental again :/ Reverted.
| // will continue to run on all leaders regardless of domain. If using this
| // feature, it's a good idea to configure ReindexerTimeout on all but a
| // single leader domain to river.NeverSchedule().
Now that we run many pods of the same worker, does "leader" here mean a single pod or a group of workers? If only one pod should be reindexing, what do we do when that pod dies? Or would it also be alright to have, say, 3 reindexer pods and many more that don't reindex?
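For what it's worth, here's the deployment shape the doc comment seems to describe, as I read it: every replica (pod) of a given deployment shares one domain, the replicas within a domain still elect a single leader among themselves (so reindexing fails over within that domain if its current leader dies), and only one domain keeps the reindexer enabled. A hedged sketch follows; `LeaderDomain` comes from this PR, and I'm assuming the reindexer knob is `ReindexerSchedule` with `river.NeverSchedule()` rather than the `ReindexerTimeout` named in the quoted text.

    package main

    import (
        "context"
        "log"

        "github.com/jackc/pgx/v5/pgxpool"
        "github.com/riverqueue/river"
        "github.com/riverqueue/river/riverdriver/riverpgxv5"
    )

    // NoOpArgs/NoOpWorker stand in for real job kinds so the sketch compiles.
    type NoOpArgs struct{}

    func (NoOpArgs) Kind() string { return "no_op" }

    type NoOpWorker struct {
        river.WorkerDefaults[NoOpArgs]
    }

    func (w *NoOpWorker) Work(ctx context.Context, job *river.Job[NoOpArgs]) error {
        return nil
    }

    func main() {
        ctx := context.Background()

        dbPool, err := pgxpool.New(ctx, "postgres://localhost/river_dev")
        if err != nil {
            log.Fatal(err)
        }

        workers := river.NewWorkers()
        river.AddWorker(workers, &NoOpWorker{})

        // Deployment A: every replica uses domain_a, so its replicas elect one
        // leader among themselves, and that leader keeps the reindexer running.
        clientA, err := river.NewClient(riverpgxv5.New(dbPool), &river.Config{
            LeaderDomain: "domain_a", // new option from this PR
            Queues:       map[string]river.QueueConfig{"queue_a": {MaxWorkers: 10}},
            Workers:      workers,
        })
        if err != nil {
            log.Fatal(err)
        }

        // Deployment B: leads its own domain but leaves reindexing to domain_a.
        // ReindexerSchedule + NeverSchedule is my assumption for expressing
        // "never reindex"; double-check against the actual config docs.
        clientB, err := river.NewClient(riverpgxv5.New(dbPool), &river.Config{
            LeaderDomain:      "domain_b",
            Queues:            map[string]river.QueueConfig{"queue_b": {MaxWorkers: 10}},
            ReindexerSchedule: river.NeverSchedule(),
            Workers:           workers,
        })
        if err != nil {
            log.Fatal(err)
        }

        // In real use each deployment would call Start on its own client.
        _ = clientA
        _ = clientB
        _ = ctx
    }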
Force-pushed 3bc1c32 to 1f12a24
Okay, I rebased this one to bring the proposal up to date again, in pursuit of a potential solution for #1105. I had my LLM work through some alternative solutions with me, and while it produced a variety of alternatives, none was substantially better. It would be better if the client could pick a value for leader domain based on its configured queues, but as we discussed yesterday, that doesn't work well operationally in case a queue is added to or removed from a domain. It did suggest renaming
Force-pushed 1f12a24 to be4d567
We've gotten a couple of requests so far (see #342 and #1105) to be able to start multiple River clients targeting different queues within the same database/schema, and to give them the capacity to operate independently enough to be functional. This is currently not possible because a single leader is elected for a single schema, and it handles all maintenance operations, including non-queue ones like periodic job enqueuing.

Here, add the idea of a `LeaderDomain`. This lets a user set the "domain" on which a client will elect its leader, allowing multiple leaders to be elected in a single schema. Each leader will run its own maintenance services.

Setting `LeaderDomain` has the additional effect of making maintenance services operate only on the queues that their client is configured for. The idea here is to give us backwards compatibility, in that the default behavior (with `LeaderDomain` unset) is the same, while providing a path for multiple leaders to be interoperable with each other.

There are still a few edges: for example, reindexing is not queue specific, so multiple leaders could be running a reindexer. I've provided guidance in the config documentation that ideally, all clients but one should have their reindexer disabled.
Force-pushed be4d567 to a5c8ae8